This is a project to look at transcription in the growth cone vs. the soma, initiated by Alexandros Poulopoulos from the Macklis lab. The experimental setup is a little complex, there were only a small number of actual biological replicates run but several technical replicates. The technical replicates come in two flavors. The first is that the samples were run multiple times, there was some overclustering problems on the lanes that lead to a low number of reads, so the libraries were rerun. The second is that a small set of the samples only had half of the sample loaded, the idea was to look at the quantitative range.
There are paired growthcone-soma samples that are paired via sort date, and there are three sort dates, so there are three biological replicates. We will block on sort date in the model.
We’ll simply add the counts together for the reruns of the samples if they look to be highly correlated and don’t seem to have a batch effect. For the half loaded samples, we only have half amplified samples for the growthcone, and only from two two dates, Feb15 and Jun15. I don’t really know how we can use this information other than to just compare it to the non-half loaded samples and see how well the correlation is. We’ll probably drop those when we do the analysis.
We can see here that there is quite a bit of variation between the mapped reads that persists across the 3 different runs. BC9 in particular has a very low number of reads mapped.
Mapping rate is a better metric of if there is something wrong with the libraries. Here we see that BC14 and BC9 look pretty bad for their mapping rate but overall the mapping rate is lower than we would expect, we’d normally be expecting about 90% of the reads to map. This could be due to some kind of adapter contamination issues we are not addressing when making the samples. What kit was used to make these libraries? What adapter sequences can we expect to see on the end of the sequences?
We can see from the above plot that in general the growthcone libraries have a poorer mapping rate than the non-growthcone libraries.
We detect on average less genes in the growthcone libraries, this might make sense if there are only a subset of the genes in the growthcones. This would be a more convincing argument if there was not one growthcone sample BC10 that has a high number of genes deteceted.
This plot is to help us figure out if we are saturating the number of genes robustly detected in sequencing. Here we called detected as having counts > 10, rather than counts > 0. It looks like both the growthcone and the soma libraries cap out at the same rate, which is not what we would be expecting if the growthcone libraries were a smaller set of RNA than the soma libraries.
Another way to determine what went wrong with the sequencing is to look at the rate mapped reads map to exons. Here we see that for BC9 of the reads that map, a small amount map to exons. This, coupled with the low mapping rate generally indicates there was very little RNA in this sample and most of what got sequenced was either adapter sequence or genomic DNA.
rRNA mapping rate is reasonable for all the samples. Again we see BC9 with a very small rate, since it seems like whatever is mapping in there is not from RNA.
We can see more problems with BC9– the fragment size is very large, that means we likely had large pieces of DNA leftover that the libraries were made from. We can also see some problems with BC14, the estimated fragment length is negative. This means that the average fragment length is less than the read length, which is an indication of degraded RNA. That might be why this sample looks a little worse than the others.
When we look at BC9, about 40% of the sequence is Truseq adapter sequence. We trim this off during our processing, so these usually indicate stacks of Truseq sequence when mulitple copies are stacked together. The BC14 samples, about 10% is is taken up by the Truseq adapater. Good libraries like BC10 don’t have the sequence.
All of these lines of evidence point towards BC9 not having RNA in the sample and BC14 having degraded RNA.
Trimmed mean of M-values (TMM) normalization is described here
Robinson, M. D., & Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11(3). doi:10.1186/gb-2010-11-3-r25
Normalizing the counts by TMM-normalization can’t full normalize the samples.
Here we can see that the BC9 and BC14 libraries cluster together. These are likely because these libraries both have issues with adapter contamination that are symtpoms of having a low RNA content. Dropping these samples seems prudent. The BC6 sample also looks like it is an outlier from the other samples in the PCA plot, but it is different than the BC9 and BC14 samples. The BC6 sample was the other sample with a very low number of genes detected. This sample doesn’t have the same adapter contamination issue, but the GC content looks out of whack with a huge spike in the mean GC content instead of an even distribution.
The BC6 library looks like it is less complex than the other libraries, when we look at the sequence duplication plots. As an extreme example, lets look a the duplication plot for BC9:
BC9 was mostly adapter sequence and we can see there are reads that are duplicated over 10k times. For an example of what a good library looks like, lets look at BC10:
It is normal to see some duplication in RNA-seq just because there is only 2% of the genome that is sequenced, so you should see some duplication, just not sequences with a crazy amount.
BC6 has a big spike:
A sample that was like BC6 in terms of reads mapped and mapping percentage is BC5. That sample doesn’t have as much of a prominent spike:
These are all growthcone libraries. So the question is, is this sample what we are expecting to see when we look at the growthcones, a low number of genes detected? Or are the other samples where we see a high number of genes detected what we expect. I think that the BC6 library probably also had low RNA content, just not as low as BC9 and BC14, but it is still problematic. So going forward I’m going to drop BC9, BC14 and BC6 from the analysis. Hopefully that leaves us with enough data to do something with.
The BC6 and BC5 samples are the two samples that are in the half loaded group, so I think the takehome is don’t do that in the future.
There is some variability by sort date. The Feb15 samples separate out from the other samples along the 2nd principal component by sort date:
Unfortunately the growthcone sample from Feb15, the only sample that survived is BC5, which is the last remaining half loaded sample, so we might be making some apples to oranges type comparison with that one.
We can see the amount of starting material has an effect for the Feb15 soma samples but not so much for the Oct14 samples. We can just drop the samples with starting material = 50 for those.
To recap we’re left with samples that separate like this on the PCA:
Each of those are groups of three from each run; the runs almost perfectly overlap so it is safe to combine them into one set of counts for each library. We’ll construct sample summary information and a count matrix of those combined samples. For the counts we’ll just add the counts together for each run.
That leaves us with this for the PCA:
and samples that look like this:
| barcode | barcode_id | compartment | half_amplified | sort_date | Name | |
|---|---|---|---|---|---|---|
| BC10 | GGAGAA | BC10 | growthcone | no | Oct14 | BC10 |
| BC12 | GAGTCA | BC12 | soma | no | Oct14 | BC12 |
| BC16 | TTGGCA | BC16 | soma | no | Feb15 | BC16 |
| BC5 | ACCTCA | BC5 | growthcone | yes | Feb15 | BC5 |
| BC7 | AAGCCT | BC7 | growthcone | no | Jun15 | BC7 |
| BC8 | GTCGTA | BC8 | soma | no | Jun15 | BC8 |
We’re a little stuck because one of the Feb15 samples is the half amplified sample so we’re comparing apples to oranges a little bit there.
For diffential expression we’ll drop all genes that have aren’t expressed in any of the samples, and analyze them with DESeq2, running a paired analysis looking at the effect of compartment, doing a paired analysis comparing only samples sorted on the same date to each other.
TableGrob (2 x 2) "arrange": 3 grobs
z cells name grob
1 1 (2-2,1-1) arrange gtable[arrange]
2 2 (2-2,2-2) arrange gtable[arrange]
3 3 (1-1,1-2) arrange text[GRID.text.9097]
Here we plot some information about how the p-values are correlated with the mean or the standard deviation. For these, within each quantile we should see an even distribution. We should see an encrichment for lower p-values if there is a set of genes differentially expressed and we see that as well.
Finally, we’ll take the sets of differentially expressed genes and do some work with them. We’ll attach some metadata about the genes to each gene and we’ll do some gene set enrichment analyses to see if certain sets of genes are differentially regulated.
There are NA differentially expressed genes. NA are higher in the growthcones compared to the soma and NA are lower in the growthcones compared to the soma.
Many of these differentially expressed genes we see only in the growthcone:
| baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | symbol | |
|---|---|---|---|---|---|---|---|
| ENSMUSG00000019230 | 180.53565 | 9.878780 | 1.451184 | 6.807394 | 0.0000000 | 0.0000001 | Lhx9 |
| ENSMUSG00000061808 | 160.98345 | 9.707657 | 1.461727 | 6.641226 | 0.0000000 | 0.0000002 | Ttr |
| ENSMUSG00000029306 | 187.62623 | 9.011205 | 1.439243 | 6.261072 | 0.0000000 | 0.0000016 | Ibsp |
| ENSMUSG00000025927 | 125.02822 | 9.085669 | 1.512571 | 6.006771 | 0.0000000 | 0.0000046 | Tfap2b |
| ENSMUSG00000030787 | 157.28547 | 8.726159 | 1.454612 | 5.998959 | 0.0000000 | 0.0000046 | Lyve1 |
| ENSMUSG00000097248 | 114.30502 | 8.145578 | 1.498458 | 5.435972 | 0.0000001 | 0.0000957 | Gm2694 |
| ENSMUSG00000074141 | 114.08149 | 7.913042 | 1.567218 | 5.049101 | 0.0000004 | 0.0003898 | Il4i1 |
| ENSMUSG00000027555 | 68.16616 | 7.664327 | 1.541014 | 4.973560 | 0.0000007 | 0.0005050 | Car13 |
| ENSMUSG00000099021 | 59.25460 | 7.469163 | 1.611454 | 4.635045 | 0.0000036 | 0.0016248 | Rn7s1 |
| ENSMUSG00000019929 | 80.11998 | 7.024879 | 1.613057 | 4.355011 | 0.0000133 | 0.0038787 | Dcn |
| ENSMUSG00000028370 | 52.23069 | 6.948483 | 1.632888 | 4.255333 | 0.0000209 | 0.0050314 | Pappa |
| ENSMUSG00000038805 | 38.32846 | 6.985116 | 1.646536 | 4.242311 | 0.0000221 | 0.0051548 | Six3 |
| ENSMUSG00000013584 | 52.43797 | 6.840155 | 1.636036 | 4.180930 | 0.0000290 | 0.0062612 | Aldh1a2 |
| ENSMUSG00000097292 | 61.17950 | 6.799249 | 1.641647 | 4.141723 | 0.0000345 | 0.0067261 | A230107N01Rik |
| ENSMUSG00000022053 | 54.34164 | 6.737800 | 1.648701 | 4.086733 | 0.0000437 | 0.0075281 | Ebf2 |
| ENSMUSG00000057058 | 55.39098 | 6.706249 | 1.651297 | 4.061201 | 0.0000488 | 0.0080021 | Skap1 |
| ENSMUSG00000029697 | 38.05380 | 6.699841 | 1.659799 | 4.036538 | 0.0000542 | 0.0085115 | Fezf1 |
| ENSMUSG00000046352 | 41.33532 | 6.687360 | 1.656769 | 4.036385 | 0.0000543 | 0.0085115 | Gjb2 |
| ENSMUSG00000028487 | 25.83526 | 6.757819 | 1.678552 | 4.025981 | 0.0000567 | 0.0087185 | Bnc2 |
| ENSMUSG00000061132 | 48.65750 | 6.572985 | 1.661828 | 3.955275 | 0.0000764 | 0.0110179 | Blnk |
| ENSMUSG00000054044 | 43.26129 | 6.584997 | 1.669434 | 3.944448 | 0.0000800 | 0.0110477 | NA |
| ENSMUSG00000018698 | 33.07315 | 6.539978 | 1.662046 | 3.934896 | 0.0000832 | 0.0113686 | Lhx1 |
| ENSMUSG00000086382 | 58.42728 | 6.472519 | 1.650775 | 3.920896 | 0.0000882 | 0.0117880 | Chrna1os |
| ENSMUSG00000026726 | 29.08378 | 6.452672 | 1.672313 | 3.858531 | 0.0001141 | 0.0132673 | Cubn |
| ENSMUSG00000033794 | 67.28821 | 6.332658 | 1.649456 | 3.839240 | 0.0001234 | 0.0139188 | Lpcat2b |
| ENSMUSG00000059246 | 33.75503 | 6.442316 | 1.685811 | 3.821493 | 0.0001326 | 0.0145591 | Foxb1 |
| ENSMUSG00000060969 | 49.80389 | 6.324498 | 1.669439 | 3.788397 | 0.0001516 | 0.0161802 | Irx1 |
| ENSMUSG00000039109 | 28.28900 | 6.387730 | 1.701101 | 3.755056 | 0.0001733 | 0.0179026 | F13a1 |
| ENSMUSG00000066809 | 29.71418 | 6.306957 | 1.690107 | 3.731691 | 0.0001902 | 0.0188557 | Gm10180 |
| ENSMUSG00000034384 | 27.64094 | 6.292593 | 1.692159 | 3.718676 | 0.0002003 | 0.0195390 | Barhl2 |
| ENSMUSG00000087343 | 53.31006 | 6.173489 | 1.667612 | 3.701993 | 0.0002139 | 0.0198816 | 1700021N21Rik |
| ENSMUSG00000034362 | 39.97925 | 6.183420 | 1.703384 | 3.630080 | 0.0002833 | 0.0228647 | Csta1 |
| ENSMUSG00000097960 | 43.18918 | 6.136071 | 1.688677 | 3.633656 | 0.0002794 | 0.0228647 | A330074K22Rik |
| ENSMUSG00000086526 | 38.77529 | 6.052672 | 1.710329 | 3.538894 | 0.0004018 | 0.0276130 | Gm12762 |
| ENSMUSG00000058952 | 27.11991 | 6.085238 | 1.725598 | 3.526451 | 0.0004212 | 0.0279861 | Cfi |
| ENSMUSG00000065767 | 31.17731 | 5.994285 | 1.726391 | 3.472148 | 0.0005163 | 0.0317351 | Gm23849 |
| ENSMUSG00000053747 | 47.12911 | 5.845555 | 1.691006 | 3.456851 | 0.0005465 | 0.0332597 | Sox14 |
| ENSMUSG00000012819 | 27.31586 | 5.866267 | 1.742404 | 3.366766 | 0.0007606 | 0.0395985 | Cdh23 |
| ENSMUSG00000024810 | 20.78847 | 5.887744 | 1.759644 | 3.345986 | 0.0008199 | 0.0418220 | Il33 |
| ENSMUSG00000072789 | 42.95402 | 5.696333 | 1.704526 | 3.341887 | 0.0008321 | 0.0419225 | Gm10420 |
| ENSMUSG00000087575 | 22.66982 | 5.842407 | 1.755565 | 3.327936 | 0.0008749 | 0.0425490 | Gm12976 |
| ENSMUSG00000043286 | 27.03936 | 5.789444 | 1.748426 | 3.311231 | 0.0009289 | 0.0434164 | Pnpla1 |
| ENSMUSG00000031551 | 29.96437 | 5.737145 | 1.747520 | 3.283021 | 0.0010270 | 0.0460768 | Ido1 |
| ENSMUSG00000022949 | 12.59741 | 5.761572 | 1.787150 | 3.223888 | 0.0012646 | 0.0514771 | Clic6 |
| ENSMUSG00000078302 | 17.16787 | 5.615443 | 1.787999 | 3.140630 | 0.0016859 | 0.0602447 | Foxd1 |
| ENSMUSG00000030579 | 26.49493 | 5.489562 | 1.756698 | 3.124933 | 0.0017785 | 0.0627879 | Tyrobp |
| ENSMUSG00000036907 | 22.88630 | 5.440797 | 1.762057 | 3.087754 | 0.0020168 | 0.0657423 | C1ql2 |
| ENSMUSG00000046714 | 17.21059 | 5.443551 | 1.784652 | 3.050203 | 0.0022869 | 0.0706344 | Foxc2 |
| ENSMUSG00000033249 | 12.32118 | 5.493753 | 1.802617 | 3.047655 | 0.0023063 | 0.0710575 | Hsf4 |
| ENSMUSG00000040592 | 18.08584 | 5.409538 | 1.798048 | 3.008562 | 0.0026249 | 0.0766703 | Cd79b |
and only in the soma:
| baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | symbol | |
|---|---|---|---|---|---|---|---|
| ENSMUSG00000097330 | 53.34832 | -7.736523 | 1.583414 | -4.885976 | 0.0000010 | 0.0007071 | Gm26672 |
| ENSMUSG00000090066 | 53.50288 | -7.686146 | 1.582836 | -4.855932 | 0.0000012 | 0.0007752 | 1110002E22Rik |
| ENSMUSG00000072720 | 41.37656 | -7.354547 | 1.614068 | -4.556529 | 0.0000052 | 0.0020785 | Myo18b |
| ENSMUSG00000078700 | 41.04595 | -7.357684 | 1.615338 | -4.554887 | 0.0000052 | 0.0020785 | D030028A08Rik |
| ENSMUSG00000031015 | 35.55094 | -7.031299 | 1.641255 | -4.284098 | 0.0000183 | 0.0046031 | Swap70 |
| ENSMUSG00000024076 | 32.49559 | -6.988414 | 1.647711 | -4.241286 | 0.0000222 | 0.0051548 | Vit |
| ENSMUSG00000076439 | 33.41303 | -6.843497 | 1.650961 | -4.145160 | 0.0000340 | 0.0067261 | Mog |
| ENSMUSG00000022297 | 35.96862 | -6.798469 | 1.664285 | -4.084919 | 0.0000441 | 0.0075281 | Fzd6 |
| ENSMUSG00000003680 | 28.86612 | -6.786555 | 1.667449 | -4.070022 | 0.0000470 | 0.0079161 | Taf6l |
| ENSMUSG00000043165 | 24.99438 | -6.452523 | 1.694638 | -3.807612 | 0.0001403 | 0.0151306 | Lor |
| ENSMUSG00000032590 | 25.58625 | -6.194192 | 1.674445 | -3.699250 | 0.0002162 | 0.0198816 | Apeh |
| ENSMUSG00000022562 | 23.06931 | -6.297498 | 1.704904 | -3.693755 | 0.0002210 | 0.0199988 | Oplah |
| ENSMUSG00000052911 | 25.76963 | -6.114397 | 1.676191 | -3.647792 | 0.0002645 | 0.0221250 | Lamb2 |
| ENSMUSG00000023953 | 20.41719 | -6.219618 | 1.719274 | -3.617583 | 0.0002974 | 0.0232836 | Polh |
| ENSMUSG00000036040 | 21.73913 | -6.164578 | 1.720371 | -3.583285 | 0.0003393 | 0.0251266 | Adamtsl2 |
| ENSMUSG00000074358 | 21.71465 | -6.113547 | 1.709010 | -3.577244 | 0.0003472 | 0.0254081 | Ccdc61 |
| ENSMUSG00000029918 | 19.35955 | -6.115846 | 1.730016 | -3.535139 | 0.0004076 | 0.0277308 | Mrps33 |
| ENSMUSG00000031907 | 17.83840 | -5.981244 | 1.742541 | -3.432484 | 0.0005981 | 0.0347170 | Zfp90 |
| ENSMUSG00000074576 | 16.93915 | -5.930997 | 1.749967 | -3.389205 | 0.0007010 | 0.0383108 | Mocs3 |
| ENSMUSG00000027932 | 17.13699 | -5.910825 | 1.749346 | -3.378877 | 0.0007278 | 0.0389007 | Slc27a3 |
| ENSMUSG00000001930 | 36.75336 | -5.756988 | 1.710475 | -3.365724 | 0.0007634 | 0.0395985 | Vwf |
| ENSMUSG00000024942 | 18.56127 | -5.873209 | 1.744486 | -3.366727 | 0.0007607 | 0.0395985 | Capn1 |
| ENSMUSG00000096929 | 18.80221 | -5.792251 | 1.751532 | -3.306963 | 0.0009431 | 0.0437507 | A330023F24Rik |
| ENSMUSG00000051235 | 17.12275 | -5.741189 | 1.758212 | -3.265356 | 0.0010933 | 0.0470265 | Gen1 |
| ENSMUSG00000031520 | 15.36256 | -5.760815 | 1.767026 | -3.260176 | 0.0011134 | 0.0473613 | Vegfc |
| ENSMUSG00000034471 | 18.35723 | -5.665601 | 1.743003 | -3.250483 | 0.0011521 | 0.0483368 | Caskin2 |
| ENSMUSG00000043872 | 15.55832 | -5.661474 | 1.770229 | -3.198159 | 0.0013831 | 0.0534661 | Zmym1 |
| ENSMUSG00000030588 | 21.06924 | -5.514464 | 1.742779 | -3.164178 | 0.0015552 | 0.0574220 | Yif1b |
| ENSMUSG00000022441 | 15.36754 | -5.595616 | 1.775193 | -3.152117 | 0.0016209 | 0.0587785 | Efcab6 |
| ENSMUSG00000022555 | 13.99173 | -5.554287 | 1.784903 | -3.111814 | 0.0018594 | 0.0633923 | Dgat1 |
| ENSMUSG00000031813 | 13.70398 | -5.555615 | 1.787200 | -3.108558 | 0.0018800 | 0.0633923 | Mvb12a |
| ENSMUSG00000044229 | 16.03543 | -5.519896 | 1.775394 | -3.109111 | 0.0018765 | 0.0633923 | Nxpe4 |
| ENSMUSG00000097921 | 13.96588 | -5.549020 | 1.785742 | -3.107404 | 0.0018874 | 0.0633923 | Gm26576 |
| ENSMUSG00000074506 | 15.00148 | -5.473133 | 1.783568 | -3.068643 | 0.0021503 | 0.0679540 | Gm10705 |
| ENSMUSG00000052544 | 14.04804 | -5.478582 | 1.788433 | -3.063342 | 0.0021888 | 0.0686400 | St6galnac3 |
| ENSMUSG00000061740 | 13.53015 | -5.460490 | 1.793256 | -3.045014 | 0.0023267 | 0.0713268 | Cyp2d22 |
| ENSMUSG00000023805 | 13.03244 | -5.457716 | 1.796544 | -3.037896 | 0.0023824 | 0.0724910 | Synj2 |
| ENSMUSG00000039427 | 14.29382 | -5.305392 | 1.781856 | -2.977454 | 0.0029065 | 0.0809339 | Alg1 |
| ENSMUSG00000028583 | 12.70846 | -5.343580 | 1.804119 | -2.961878 | 0.0030577 | 0.0838398 | Pdpn |
| ENSMUSG00000074715 | 12.04093 | -5.313293 | 1.810639 | -2.934486 | 0.0033410 | 0.0883503 | Ccl28 |
| ENSMUSG00000031337 | 12.49781 | -5.281565 | 1.808758 | -2.919995 | 0.0035004 | 0.0913588 | Mtm1 |
Many of these are not genes that are just expressed at a low level. Many of them are in the upper quartile of the expression levels, so this isn’t just a case of a gene getting missed because it is expressed at a low level. Here are the quartiles of baseMean:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2 11.1 67.2 343.0 245.8 916900.0